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Many recent analyses have indicated that large scale WMAP data display anomalies that appear 
inconsistent with the standard cosmological paradigm. However, the effects of foreground contam- 
ination, which require elimination of some fraction of the data, have not been fully investigated 
due to the complexity in the analysis. Here we develop a general formalism of how to incorporate 
these effects in any analysis of this type. Our approach is to compute the full multi-dimensional 
probability distribution function of all possible sky realizations that are consistent with the data 
and with the allowed level of contamination. Any statistic can be integrated over this probabil- 
ity distribution to assess its significance. As an example we apply this method to compute the 
joint probability distribution function for the possible realizations of quadrupole and octopole using 
the WMAP data. This 12 dimensional distribution function is explored using the Markov Chain 
Monte Carlo technique. The resulting chains are used to asses the statistical significance of the 
low quadrupole using frequentist methods, which we find to be 3-4%. Octopole is normal and the 
probability of it being anomalously low or as low as WMAP reported value is very small. We 
address the quadrupole-octopole alignment using several methods that have been recently used to 
argue for anomalies, such as angular momentum dispersion, multipole vectors and a new method 
based on feature matching. While we confirm that the full sky map ILC suggest an alignment, we 
find that removing the most contaminated part of the data also removes any evidence of alignment: 
the probability distributions strongly disfavor the alignment. This suggests that most of the evi- 
dence for it comes from non-Gaussian features in the part of the data most contaminated by the 
foregrounds. We also present an example, that of octopole alignment with the ecliptic, where the 
statistical significance can be enhanced by removing the contamination. 

PACS numbers: 98.70.Vc 



I. INTRODUCTION 

The large scale structure of the WMAP data has 
received a lot of attention since the first data release. 
Some authors have focused on the seemingly low values 
of quadrupole and octopole 0, 0, 0, H , and their align- 
ment |a, |6( , while others considered various asymmetries 
in the data 0, H, El E| • Some of these analyses are per- 
formed on one of the available full sky WMAP maps, 
either the original WMAP Internal Linear Combination 
Map (ILC) or an alternative map Q, which we will refer 
to as TOH map. The full sky maps have the advantage 
that the harmonic analysis is unique, which facilitates in- 
vestigation and assessment of statistical significance. We 
note that p| prudently warn against the usage of the 
foreground corrected full-sky maps and use appropriate 
Monte-Carlo simulations. 

However, full sky maps are not free of contamination, 
as was clearly emphasized by the WMAP team warn- 
ing that their ILC map should not be used for science 
purposes. These are dominated by galactic foregrounds 
such as dust, synchrotron or free- free emission. For this 
reason the power spectrum analysis is done on cut sky, 
where about 15-25% of most contaminated data in the 
galactic plane is removed. Even outside this region there 
are residual uncertainties associated with imperfections 
of the foreground removal. If ignored they may cause 



spurious alignments or other anomalies that appear sta- 
tistically significant under the assumption that CMB is 
a Gaussian random field. 

In this paper we revisit the statistical significance of 
these tests using a different approach. Rather than ig- 
noring the effects of foregrounds we try to take them into 
account explicitly by exploring the uncertainties they in- 
duce in the measurements of the multipole moments a( m . 
We assess this uncertainty by determining the joint multi- 
dimensional probability distribution function for the true 
sky multipole moments. Once these are determined we 
can apply them to the statistics of choice to obtain their 
values in the presence of uncertainties associated with 
imperfect foreground removal or sky cuts. We do not 
address the question of the meaning of a given statistic: 
all of the statistics are a-posteriori and their statistical 
significance is difficult to assess. Instead, our goal is to 
compare the values of these statistics with and without 
the inclusion of foreground uncertainties to see if includ- 
ing the latter changes the conclusions significantly. 

In this paper we are interested in large scale features, 
so we focus exclusively on quadrupole and octopole. This 
allows us to perform several tests. First, we can revisit 
the question of whether the quadrupole and octopole are 
low in the presence of additional uncertainties associated 
with the foregrounds and sky cuts. Secondly, we can test 
how robust results of methods for measuring the align- 
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ment of quadrupole and octopole are, once these uncer- 
tainties are taken into account. We do this by applying 
several of existing statistics, as well as a new one we de- 
veloped. Finally, we also explore the alignment of the 
large scale features with specific directions in the sky, 
such as that of ecliptic plane. 



II. METHOD 

We need to distinguish between two types of uncer- 
tainties. The theory of CMB fluctuations can only pre- 
dict ensemble averages of the power spectrum of CMB 
fluctuations. Given the statistical description of initial 
fluctuations in the form of a power spectrum and param- 
eters of a given cosmological model it is only possible to 
predict the power spectrum of the CMB fluctuations av- 
eraged over all possible realizations of the universe. Since 
there is only one CMB sky that we can directly measure, 
there is a corresponding uncertainty in our inference of 
the true cosmic power spectrum. This uncertainty is of- 
ten referred to as cosmic variance. 

In terms of multipole moments ag m , for an idealized 
experiment with full sky coverage and no foreground con- 
tamination or noise, their values can be determined pre- 
cisely In this case the only uncertainty is the cosmic 
variance. However, if there is noise or contamination 
we may not be able to measure precisely the individual 
multipole moments either. This introduces an additional 
level of uncertainty. For the power spectrum analysis 
such an uncertainty is automatically accounted for in the 
likelihood analysis, where sky-cuts and foreground un- 
certainties can be included in the likelihood calculations 
without determining the actual values of multipole mo- 
ments. For other more complicated statistics one must 
determine the probability distribution of the multipole 
moments first and then apply these to the statistic of 
interest. 



A. Probability distribution of the multipole 
moments 

In general determining the full probability distribution 
of multipole moments can be numerically challenging, as 
the multipole moments will be correlated and their num- 
ber will grow as the square of the number of considered 
multipoles. Here we focus on large scales only, where 
quadrupole and octopole dominate. Their joint prob- 
ability distribution function has 12 degrees of freedom. 
We use Markov Chain Monte Carlo in the form of the 
simplest Metropolis algorithm with a fixed Gaussian 
width proposal function. Since the likelihood evaluations 
are very fast this simple method is sufficient. We note 
that the posterior distribution p(ai m ) is gaussian by con- 
struction and can be fully constrained with just a few 
likelihood evaluations. This can considerably speed up 
the calculations, but is not really needed for our analysis 



where we focus on the lowest multipoles only. 

In order to make valid constraints on various realiza- 
tions of the quadrupole and octopole we need to devise 
a way of calculating the likelihood of a given a2 m s and 
ct3 m s in the presence of the instrumental noise, fluctua- 
tions due to the power in the higher multipoles and fore- 
ground contamination. In analogy with the common \ 2 
analysis, the likelihood of a given model is the probabil- 
ity that the residuals between the theoretical map for a 
given theoretical model (i.e. a map corresponding to a 
quadrupole and octopole) and the measured map are a 
possible noise realization. Therefore, the likelihood can 
be written as: 

logL = ~(d - d T ) T (C total )- 1 (d -d T )+D (1) 

where dx is the theoretical map data-vector 

(d-r)i = ^2 ^2 Yt m (ni)a hn , (2) 

e=2,3m=-i...i 

with rij being the unit vector in the direction of the i-th 
data-point and D is an unimportant constant. 

The covariance matrix C total is the total covariance 
matrix, which can be broken into several parts as follows: 

^itotal C -)- N -|- A(C dus ^ -|- £) s y nc h _|_ £jfree— free _|_ 

C i= + C e=i ) _ (3) 

The main contribution is the noise due to fluctuations 
in the multipoles higher than recovered ones: 

^ 2/4- 1 

C id = ~^ C "Mcos 9ij)B t , (4) 

where the sum starts at the lowest multipole which is 
not being recovered, Pi is the Legendre Polynomial of 
order /, Bg is the beam smoothing and #y is the angle 
between ith and jth point on the sky. We denote with N 
the instrumental noise matrix, which for I = 2, 3 is small 
compared to other sources of noise. 

The matrices C dust , C synch and C frcc - froe are fore- 
ground contamination matrices, given by 

Qforeground = LL t j (5) 

where L is the foreground template vector. As discussed 
in previous paper [4j , the linear modes that correlate with 
the foreground template vector are effectively marginal- 
ized out in the limit of A — > oo. This is not the only 
option if one has additional information on the proba- 
ble amplitude of these foregrounds, so in this paper we 
also try A = 1, corresponding to the case where multi- 
frequency information in WMAP does provide some in- 
formation on their amplitude, but with an error of order 
unity compared to the best fit value. Finally, we also 
marginalize the residual dipole and monopole in the map 
with A — * oo, since we have no external information on 
their value. 
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This method allows us to calculate the likelihood of 
a given realization of a quadrupole and octopole in the 
presence of power in higher multipoles, foregrounds and 
monopole/dipole contamination. We explore the likeli- 
hood surface by making steps in a random direction, at 
each step deciding on whether to accept the new event 
based on the likelihood ratio relative to previous event. 
This MCMC sampling of the likelihood surface results 
in a 12 dimensional probability distribution of multipole 
moments, which is consistent with the data and with the 
allowed level of foreground contamination. Each MCMC 
element consists of a realization of multipole moments 
defined on the uncontaminated and uncut sky, so one can 
apply any statistic of choice to a given realization with- 
out having to worry about contamination, noise or sky 
cuts and various effects associated with implied priors. 
By averaging over MCMC elements one is performing 
marginalization over the full probability distribution. A 
related method has been proposed in 0] and (lflj . 

It is important to realize that the inverse of the co- 
variance matrix is calculated just once and therefore the 
likelihood evaluation is a very fast process. In fact, the 
entire MCMC process takes a few hours on a modern PC 
workstation. The method is trivially extended to higher 
multipoles. If a map has N pixels, it is completely de- 
fined by the ^ max oc N 1 / 2 multipoles. Assuming the time 
required to calculate a candidate map scales as N (if 
we are moving in a specific a^ m direction) and that the 
MCMC chain is limited by the random walk rather than 
shot noise in the sample density, then a very favorable 
scaling of N 3 ^ 2 is obtained for reconstructing all mul- 
tipoles in the map. However, it is very likely that the 
method will be limited by the very complicated multi- 
modal likelihood shape in the case of higher multipole 
reconstruction. 

B. Choice of maps and foreground templates 

We focus on three increasingly conservative combina- 
tions to asses the stability of statistical inferences. All 
maps are smoothed using 5° FWHM beam and down- 
sampled to the nside=16 Healpix map. Our three 
datasets in order of increasing conservativeness are: 

• ILC dataset contains the full-sky ILC map. Noise 
is ignored and so are foreground contaminations. 
This dataset can be used to compare our results 
with the previously reported results and to asses 
whether the methods produces the expected re- 
sults, but is likely to be too contaminated for results 
to be reliable. 

• Wd dataset takes the W channel data, applies the 
Kp2 mask, subtracts the MEM derived Free-Free 
foreground and marginalizes over MEM derived 
Dust template (using A = 1 and A = oo). 

• Vdfs takes the V channel map, applies the Kp2 
mask, and marginalizes over all three foreground 



templates of Dust, Free-Free and Synchrotron emis- 
sion. 

We have run the MCMC process until at least 500000 
independent samples were obtained and discarded first 
10000 samples. The process is very fast and a large num- 
ber of samples ensures that the chain is well converged. 
We checked this by comparing results of the first quarter 
of samples with that of the last quarter. 



C. Priors and implied priors 

Multipole moments ai m are drawn from a Gaussian 
distribution with variance Cg. The values of ag m are in 
general complex, but the requirement that the observed 
sky is a real quantity demands that ai m — a\_ m . This 
requires that the imaginary part of ago vanishes and thus 
we are left with 2^+1 degrees of freedom. We introduce 
the symbol Dg to describe the "measured power spec- 
trum" on a given sky, 

^ = ^TiEi afo ™i 2 - (6) 

m 

Our MCMC process takes flat priors on the values of 
each of the imaginary and real components of a,£ m , which 
are wide enough so that they do not affect the posterior 
probability distribution p{ag m ). This, however, implies a 
non-flat prior on the derived probability distribution for 
D t . 

This can be calculated as follows. In the 21 + 
1 dimensional space spanned by (ae,o, \[2 Re {ae,i), 
\/2 Im (a^i), . . . ) the points of constant Dt lie on the 
hyper-sphere with radius R = {{21+ l)!)) 1 / 2 . The num- 
ber of states corresponding to the volume of the shell of 
thickness AR determines the number of available states 
and therefore the implied prior is given by 

Pim P iied(-D*)dA oc R 2e dR cx D e -idD. (7) 

In particular, the implied prior for the quadrupole is 
Pim P iicd(£>2) oc D 3 / 2 and for the octopole Pi m piicd(-C>3) oc 
D 5 / 2 . 

As usual in such analyses, if the data were perfect the 
assumed prior would be irrelevant, while in the opposite 
limit all information comes from priors. It seems however 
unlikely that the assumed priors would affect our results 
on alignments (discussed below), as long as the priors 
do not introduce any correlations between the multipole 
moments. Therefore, we take these implied priors into 
account when calculating the estimates of the statisti- 
cal significance of quadrupole and octopole (because a 
flat Cg prior is usually assumed in analyses of this kind), 
but not when assessing the alignment of quadrupole and 
octopole. 
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In the rest of this paper we will also use symbols Q 
and Dp which correspond to a more conventional normal- 
ization of Ce and Dp 



_ Cp£(£ + 1) 
2vr 

Dd{l+l) 
2tt 



(8) 
(9) 



D. 



Consistency check: measured versus true power 
spectrum 



The Dp (equation |SJ) is \ 2 distributed with 2£ + 1 de- 
grees of freedom, 



p{Dp\C t ) 



c, r [i + ~ 



Dp 

1 XP 1 Ce 



(10) 

However, we measure and assuming a flat prior on 
Cp the Bayes theorem says: 



p{C t \D t )Kp{Dt\C t ) 



(11) 



For an idealized experiment with full sky coverage and 
no foreground contamination or noise, the value of Dp 
can be determined precisely and therefore one is only 
constrained by the cosmic variance from Equations (|10|) 
and as discussed in the introduction. For a real 
experiment, one must marginalize over this uncertainty 



p(Cp 



p(C e \Dp)p(Dp)d P (Dp) 



(12) 



Given the probability distribution function for ap m al- 
lows one to calculate the probability distribution function 
for p(Dp) using 



p(Dp 



Er 



2£+ 1 



-P(aoo, ■ 



«) 



xda^o • ■ • dapp, (13) 



where I ma x is the maximum £ for which this distribution 
is recovered. Combining the expressions above gives us 
p(C t ). 

As mentioned in the introduction the various power 
spectrum determination procedures such as PCL (see e.g. 
[l4() and QML (see e.g. [13) estimators or the exact 
methods using matrix inversion already take into account 
the cosmic variance to derive p(Cp). As a consistency 
check we compare the p(C2) derived from our chains to 
that of the exact likelihood analysis performed in Q| . We 
do this in the following manner. We calculate the p(D2) 
using our MCMC chains. The inferred probability distri- 
bution is then multiplied by Z?~ 3 / 2 to take into account 
the effect of the implied prior effect as described in the 
section Hi CI Finally we numerically integrate Equation 
(|12|l to get the probability distribution p(Ca). The result- 
ing p(C2) for the Vdfs case is shown in Figure^ together 




FIG. 1: This Figure shows the probability distribution func- 
tion for the true quadrupole in Vdfs case, calculated using 
two methods. The solid line correspond the p(C2) distribu- 
tion function derived using MCMC chains, while the dashed 
line corresponds to the distribution derived using the exact 
likelihood calculation by matrix inversion. The two curves 
are in a good agreement indicating that the method is nu- 
merically robust. 



with the _p(C,2) obtained by calculating the exact likeli- 
hood using the matrix formalism. The two curves are in 
a good agreement. This confirms that the method works 
as expected. 



E. Fitting individual foregrounds 

We have also tested the behavior of the system if one 
takes the amplitudes of the foregrounds to be additional 
variables of the system. In this case, the marginalization 
is performed by explicit fit to the amplitudes of these 
foregrounds. We have taken the V frequency data, ap- 
plied the KP2 cut and used the standard foreground tem- 
plates used by the WMAP team: the 408MHz Haslam 
synchrotron radiation map jlfil| .H-o; map from |l7j as a 
tracer of free-free emission and the FDS dust template 
based on . We used these templates instead of MEM 
derived maps to make the inferred amplitudes as statis- 
tically independent as possible and thus avoid compli- 
cations associated with signal-noise correlations which 
become apparent when using MEM foreground maps. 
Additionally we imposed constraints that all amplitudes 
must be positive. The resulting 12+3 dimensional prob- 
ability space is explored using the standard MCMC pro- 
cedure. We show the resultant amplitudes in Figure |21 
In this plot, the amplitudes have been rescaled to aver- 
age to one as the absolute numbers are not important for 
our application (templates give only the flux ratio be- 
tween the pixels). The data determine the amplitude of 
the dust and free-free emissions fairly accurately, but the 
amplitude of the synchrotron is only poorly constrained 
and consistent with 0. This is what is expected, since the 
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FIG. 2: This figure shows the inferred amplitudes of fore- 
ground maps when used to fit the data. The x axis has been 
rescaled to average to one. See text for further discussion. 



synchrotron is not anticipated to be a major contaminant 
at the frequencies of the V channel. 

We have also investigated degeneracy directions which 
might exist between various templated amplitudes and 
the values of D2 and D3 . Template amplitudes are nearly 
completely uncorrelated between each other and with the 
multipole amplitudes. The only marginally significant 
correlation exists between the Dust amplitude and the 
value of D2. This is shown in Figure 01 



III. RESULTS 
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FIG. 3: This Figure shows the two-dimensional distribution 
of the probability on the D2 - Dust amplitude plane. Contours 
correspond to 1 and 2-sigma confidence limits. See text for 
further discussion. 



region actually reveals the probability distribution is a 
narrow Gaussian with FWHM of a few fiK 2 , consistent 
with the effect of finite pixelization, etc. The confidence 
limits on the two parameters widen with the inclusion of 
sky cuts and the increasing amount of foreground corre- 
lated modes being marginalised out. We note that Wd 
case is very similar for A = 1 and A — * 00. This is 
consistent with our findings in Section III El The former 
sets the uncertainty in the foreground amplitude to the 
O(l), while the latter effectively marginalises over the 
foreground amplitude probability given the data, which 
we have shown to be of the order O(l). Therefore, we 
will limit our discussion to A — > 00 in the rest of this 
paper. It is interesting to note that while including more 
uncertainty in foreground subtraction (Vdfs versus Wd) 
makes the distribution broader, it also pushes the median 
to lower values. 

We also plot the 2-dimensional distribution on the D2- 
D3 plane for the most conservative Vdfs case in Figure 
[3] D2 and D3 are nearly completely uncorrelated, in ac- 
cordance with previous findings. 



A. Distributions for D2 and D3 

In Figure^ we show the marginalized one-dimensional 
distributions for D2 and D3 for the three cases and the 
Wd case with A = 1. These show several interesting fea- 
tures. The ILC map gives values of ~ 195/iK 2 for the 
quadrupole and ~ 1050/iif 2 for the octopole, in a good 
agreement with results obtained previously on the same 
map using the QML estimator |2| ■ Zooming up into the 



B. How low are the quadrupole and octopole? 

Reference pj proposes two methods for assessing the 
statistical significance for the low quadrupole, assuming 
that the D2 distribution is known. The Bayesian esti- 
mate is concerned with the probability that C2 is greater 
that its concordant value (1150[iK 2 ) given the value of 
D 2 . This is just the integral of the p(C2) from its con- 



6 




100 200 300 400 



D 2 [/* K 2 ] 







r 








'17 




CO 




i" r r 


1 1 1 


d 


I 


1 1-J J 


i tt_ 


■bability 
).6 


I 

I 




i n 
i I ^ 

hi 




i r 




-ih '-i 


relative 
.4 


rr 




ill 

L H 




J J 

1 f r 




M ^ 


0.2 


\Y J 

— 1 1— 1 J 







600 800 1000 1200 1400 



D 3 [ M K 2 ] 

FIG. 4: This Figure shows the distribution for D2 and D3 for 
the following four cases: ILC (vertical lines), Wd (thin line), 
Wd with A = 1 (thick dashed line) and Vdfs (thick line). 



cordant value upwards and therefore our MCMC chains 
do not add new information, as the p(C 2 ) can already 
be obtained from the exact likelihood evaluation shown 
in Figure 5 of the reference Q. Here we highlight the 
fact that the likelihood at high values of true quadrupole 
is very slowly decreasing and as a result the probability 
depends on the adopted prior. We quote these results in 
the Table 1. 

Frequentist estimate is concerned with the probability 
that the value of D2 is as small or smaller than the ob- 
served D 2 given that the value of C2 takes its concordant 
value (1150/iif 2 ). We calculate a "frequentist" estimate 
integrating over the probability distribution for p(D 2 ). 
We emphasize that this is not a real frequentist estimate, 
since a true frequentist statistic is never a-posteriori. We 
have also corrected for the Bayesian implied prior on the 
values of D 2 as discussed in the Section Hi CI 

The results are shown in Table 2. We note that the val- 
ues are a factor of a few higher than the original WMAP 
estimate and do not depend much on which method 
we use. The ILC value of D 2 ~ is somewhat 
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FIG. 5: This Figure shows the two-dimensional distribu- 
tion of the probability on the D2-D3 plane for the Vdfs case. 
Contours correspond to one-sigma and two sigma confidence 
limits. 



higher, but consistent with the original WMAP value of 
154 ± 70/iK 2 [l^. We note, however, that the number 
quoted in the official power spectrum file is considerably 
lower, 12i^,K 2 . The probability for the frequentist esti- 
mate rises from 1% to about 3% with the larger value, 
which is already a major increase. Taking into account 
the non-negligible width in the probability distribution 
function for D 2 for the Wd and Vdfs does not change the 
probability significantly. Note that the WMAP value of 
123/iK 2 is perfectly acceptable for the more conservative 
Vdfs analysis, which marginalizes over 3 foreground tem- 
plates, but appears nearly excluded from Wd analysis. 
In other words, if one assumes W map with KP2 mask 
and assumes the foreground uncertainties are associated 
only with the dust template then the probability for the 
actual quadrupole on the sky D 2 to be below 123/iK 2 is 
extremely small, less than 0.01%. 

Octopole results show a similar pattern, except that 
octopole is not low compared to standard model. The 
ILC value of ~ 1050/iif 2 is in near perfect agreement 
with the concordance model value of around 1100 fiK 2 . 
This is changed into a broader distribution by the more 
conservative treatments, with the peak values close to 
the ILC value around HOO^lK 2 and the width of about 
150fiK 2 FWHM. Both distributions are similar, so the 
details of foreground marginalization do not appear very 
important here. However, the low octopole value re- 
ported by WMAP, 611 fiK 2 , is not within the allowed 
range even for the most conservative treatment and there 
is not a single MCMC sample that would give a value this 
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Table 1: This table shows the Bayesian estimates for the probability of the low quadrupole assuming a narrow (uniform between and 
2000 iiK 2 ) and a wide (uniform between and 10000 fiK 2 ) prior on the value of D2 . 



Dataset 


Narrow prior 


Wide prior 


p(D a ) = S(D 2 - 123/^) 


5.0% 


8.5% 


ILC 


9.1% 


16% 


Wd 


11% 


18% 


Vdfs 


8.8% 


15% 



Table 2: This table shows the "frequentist" estimates of the 
probability of low quadrupole. See text for further discussion. 



Dataset 


Frequentist probability 


p(D 2 ) = 5(D 2 - 123fiK z ) 


1.1% 


ILC: p(D a ) = 5{D 2 - 195/iif 2 ) 


3.0% 


Wd 


4.0% 


Vdfs 


3.1% 



low. It is not clear what the cause for this discrepancy 
with WMAP values is, but one possible culprit is the esti- 
mator used by WMAP, which is noisier than the optimal 
maximum likelihood estimator and which thus can lead 
to an estimate that differs significantly from the actual 
value 0. In conclusion, quadrupole is somewhat but not 
anomalously low, its probability is around 3-4%, while 
octopole is perfectly normal and there is essentially zero 
probability for it to be as low as WMAP reported value. 



IV. THE QUADRUPOLE AND OCTOPOLE 
ALIGNMENT 

In this section we will evaluate the statistical signif- 
icance of various statistical measures which have been 
used to claim the alignment between quadrupole and oc- 
topole. 

Once again, we would like to stress the a-posteriori 
nature of the estimators discussed below. In an ide- 
alized scientific experiment, estimators and methods to 
be used to distinguish between various theoretical mod- 
els are known before the results and thus the inferred 
constraints are objective in the sense that methods are 
"blind" with respect to the data. In practice, however, 
progress is often made by spotting patterns in the data 
which are not predicted by the theory. One must be 
careful, however, when interpreting results which are con- 
cerned with attempts to quantify the statistical signifi- 
cance of the spotted regularities. The estimators and 
methods used in the latter case are by construction biased 
towards showing a positive detection. In the remaining 
part of this paper we simply wish to address the sensitiv- 
ity of various estimators to the uncertainties caused by 
foreground subtraction and sky cuts. We do not try to 
asses the question of meaning of probabilities associated 
with various methods. 



A. Angular momentum dispersion 

Authors of have defined a unique axis for each mul- 
tipole motivated by quantum-mechanical considerations. 
For each £ the axis v is defined as one that maximizes 
the angular momentum dispersion (AMD) given by 

K = Y j m 2 \a lm \ 2 . (14) 

m 

By examining the absolute value of the dot-product 
between the AMD vectors for quadrupole and octopole 

d= |vf =2 • V£= 3 |, (15) 

one can asses the statistical significance for their align- 
ment. The absolute value takes into account the fact that 
these vectors are head-less (i.e. their negative also maxi- 
mize K) . It can be easily shown that the distribution for 
d is uniform between zero and one if AMD vectors are 
randomly distributed on the sky. 

We have implemented the code that finds the AMD 
vectors for quadrupole and octopole using the matrix 
transformations that can be found in the appendix D of 
[j| . We determine the maximum value of K by numerical 
maximization rather than calculating the values of K on 
a Healpix grid. Using values for azm and a?,m from p| we 
are able to reproduce their value of d — 0.986. 

When this method is applied to our MCMC chains we 
obtain probability distribution function for d. This prob- 
ability distribution function can, in principle, be used to 
compare the Bayesian evidence for aligned quadrupole 
and octopole model (d = 1) with a standard model (d 
being uniform between and 1). We plot our results 
in Figure These results are worthy some discussion. 
First, in the case of ILC map we get the value of d = 0.95 
which is consistent with p| and the small difference be- 
tween ~ 0.95 and ~ 0.986 is caused by the difference be- 
tween the WMAP ILC map and the full-sky TOH map. 
For the Wd case, we see that the moderately high val- 
ues of d are still preferred, but the distribution develops 
a large tail towards smaller values of d. It is interest- 
ing, however, that the very high values of d are strongly 
disfavored. Values as high as 0.98 are allowed, but the 
values of d ~ 1 seem to be strongly rejected. In the Vdfs 
case all evidence for high values of d seem to vanish. The 
probability for d > 0.98 is 0.11% for Wd and 0.5% for 
Vdfs, compared to 2% probability for the random dis- 
tribution. The corresponding numbers for d > 0.95 are 
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FIG. 6: The probability distribution function for the align- 
ment of quadrupole and octopole using the maximum angular 
momentum dispersion method. The value of 1 indicates fully 
aligned vectors, while a random model would correspond to 
uniform probability distribution. See text for further discus- 
sion. 

2.3% and 1.6%, compared to 5% probability if random. 
The probability for alignment exceeding these two values 
in the data is below what it would be in the complete 
absence of the data (random case). Thus the data do 
not show any evidence for alignment once the foreground 
uncertainties are included in the analysis. 

A more detailed inspection reveals that the quadrupole 
vectors remained fairly well defined, while the vectors for 
octopole developed a strong plane degeneracy. This de- 
generacy is responsible for the decrease in the statistical 
evidence for alignment. The main conclusion from this 
investigation is that the alignment is not robust against 
different treatments of foreground subtraction and that 
a complete alignment is strongly disfavored by the data. 

B. Multipole Vectors 

Another method to explore the alignments are the mul- 
tipole vectors |2fJ. It is based on the idea that every 
multipole of the order I is fully determined by I headless 
vectors v £ ' 1 such that 

i 

^7< ro (e)a te =#[[(^-e). (16) 

m i—1 



Pairs of these vectors can be used to form oriented 
areas, by taking a cross-product between them: 



w w=v^xv^. (17) 

We thus have one such "area" vector for the 
quadrupole and three for the octopole. If one takes 
the dot-products between w 2,1:2 (quadrupole) and w 3 ' 1 '-? 
(three octopole vectors) and orders them in decreasing 
magnitude one obtains three numbers denoted A\, A 2 
and A3. A recent paper [|| claims that these values 
are anomalously high, indicating the alignment between 
quadrupole and octopole. Since the multipole vectors are 
well defined only on full sky the analysis has been applied 
to full sky ILC maps, which may have residual contam- 
ination due to imperfect foreground subtraction. Our 
method is ideal to address these concerns in a systematic 
fashion. 

We have calculated the multipole vectors for our chains 
using the publicly available code 0. We used these vec- 
tors to obtain the probability distributions for Aj in all 
cases of our chains, as well as on a Monte-Carlo simu- 
lation of 150000 random realizations of the sky to get 
the null- hypothesis distribution for Ai. The results are 
plotted in Figure [7| Applying the method to the full 
sky TOH map we are able to reproduce the results in 
0. Note that the results are more significant for the 
dynamic quadrupole corrected than dynamic quadrupole 
uncorrected map (see for detailed description of the 
differences between these). All of the Ai values, and A3 
in particular, are high. Applying this analysis to ILC 
map we find similar, although somewhat less statistically 
anomalous, results. 

We investigate next the effect of foreground uncer- 
tainty on these statistics. It is clear from figure [7| that 
introducing these uncertainties significantly degrades the 
statistical significance. This is particularly clear for A3, 
which has a value in excess of 0.7 for TOH, while neither 
Wd not Vdfs probability distributions extend this high. 
The probability for A 3 > 0.7 is 0.09% for Wd and 0.02% 
for Vdfs, compared to 0.2% for the random case. Once 
again, adding the observations reduces the probability 
of an alignment (defined here as A3 > 0.7) relative to 
no observations. This is not very sensitive to the value 
we choose for alignment, i.e. we find similar effect for 
A3 > 0.6 and A3 > 0.5. Other parameters give similar 
results, as is clear from Figured We thus see no evidence 
that these parameters are anomalously high in the data 
compared to the random case, once the foreground un- 
certainties are accounted for. Thus, any evidence for the 
anomalously high value in the full sky map must come 
from the region that is strongly affected by the sky cuts 
or foreground subtraction. In other words, the evidence 
for the anomaly comes from the region that is most likely 
to be contaminated by foregrounds and so is not a robust 
feature in the data. 
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FIG. 7: The probability distribution for Ai measures of align- 
ment using multipole vectors. The top line shows the prior 
distribution for A3,A2,Ai (from left to right) and the po- 
sition of corresponding values for the dynamic quadrupole 
corrected (solid) and the dynamic quadrupole uncorrected 
(dashed) TOH maps Q. The next three rows correspond 
to WMAP ILC, Wd and Vdfs case respectively with thin line 
corresponding to the prior distribution and the thick line the 
distribution upon the addition of the data. The most likely 
point is normalized to one. 

C. Feature matching 

As a third example of alignment statistic we present a 
new method for determining alignment of the quadrupole 
and octopole. If the two are aligned, one expects that 
hot-spots and cold-spots match between the two. We 
thus examine the cross-correlation map created by mul- 
tiplying the quadrupole and octopole maps: 

T x = f e=2 x f e=3 . (18) 

These maps are created by first normalizing ai rn so 
that D2 = 1 and D3 = 1 (this is indicated by hats in the 
above equation). We do this, because we want to decou- 
ple the multipole amplitude from the alignment effect. 
The integral of this map across the sky necessarily gives 




FIG. 8: Two examples of a quadrupole (left) and octopole 
(right) realizations of a random sky that have particularly 
high value of the I parameter. 




10 15 20 



FIG. 9: The probability distribution for the / parameter 
for the ILC, Wd and Vdfs cases respectively. The thin line 
corresponds to the prior distribution, while the thick line to 
the distribution upon the addition of the data. The most 
likely point is normalized to one. 
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zero due to the orthogonality condition: 

J T x dA = 0. (19) 

However, the integral of the cross-correlation map gives 
a non-zero result 

I = J(T*)dA, (20) 

which is particularly high if the cold-spots and hot-spots 
match between the two. This is illustrated in Figure |H| 
where we show two examples of a random sky realiza- 
tion with a particularly high value of the / parameter. 
The figure indicates that the method indeed captures 
the heuristic idea of aligned quadrupole and octopole in 
terms of hot and cold spot overlap. 

We calculate the value of I parameter for all three 
cases and also a large number (150000) of Monte Carlo 
simulations of a random sky to determine the prior dis- 
tribution of the I parameter. The results are shown in 
Figure |5] The value for the / parameter in the case of 
the ILC map is again very high, with the random proba- 
bility of being of that value or higher of only 0.9%. Once 
again this anomaly disappears in the case of the more 
conservative approaches, suggesting that the alignment 
is in the region most contaminated by the foregrounds. 



D. Alignment with the Ecliptic pole 

As a final example of the application of our method 
we investigate the asymmetry in the WMAP data on 
two hemispheres separated by the ecliptic plane. There 
were several reports of detection of this asymmetry us- 
ing various methods 

U H El The method most 
relevant for us is the angle between multipole w vectors, 
denned in the Equation 1171 and the ecliptic north pole 
6]. This test is performed by calculating the dot product 
of the w ' lJ vectors with the north ecliptic pole. We per- 
form this multiplication for the quadrupole and call the 
resulting quantity B 2 . When the same is performed for 
the octopole, we sort the values in ascending order and 
call them B31, B 32 and B 33 . Results of these analyses 
for our MCMC chains are shown in Figures Hill and ITTI 

We see that these results are considerably more stable 
with respect to the sky cuts and foreground marginaliza- 
tion than alignments discussed above. In order to asses 
this, we introduce three models: 

• NULL model assumes that B parameters are dis- 
tributed according to the random sky hypothesis 

• ALIGN1 model assumes that B 3 i < 0.02 and B 32 < 
0.02. This is effectively equivalent to the assump- 
tion that -B31 = B32 = 0, but with the advantage 
that its Bayesian evidence can be calculated from 
the existing chains. 



• ALIGN2 model assumes B 31 < 0.02, B 32 < 0.02 and 
additionally B 2 < 0.02. 

These models are, of course a posteriori. In fact, there 
are 16 possible models similar to ALIGN1 and ALIGN2 in 
which one or more B arc anomalously low. Therefore, 
interpretation of any result coming from the above should 
consider the the 1 in 16 factor coming from the biased 
choice of models. If we take into account the fact that 
one is working with the derived w vectors rather than 
the source multipole vectors v the biased choice "factor" 
becomes even higher. 

Nevertheless, we asses the probability of this occur- 
ring by chance using two methods. Firstly, we measure 
the ratio of probabilities of each model given data to its 
probability in the isotropic case and secondly we calcu- 
late the Bayesian evidence for all three models for the 
Wd and Vdfs case. Bayesian evidence is the probability 
of a given model in the Bayesian context, assuming all 
models to have the same prior probability. It is the likeli- 
hood integrated over the prior volume and thus disfavors 
models that have either low likelihood or unnecessarily 
large number of parameters (leading to large volume of 
parameter s pac e having low likelihood). For further dis- 
cussion see [22j . See Appendix El for discussion of our 
method for estimating evidence. Results are shown in 
Table 3. The two methods are different in the way how 
the NULL model is treated (i.e. the -Enull is sensitive 
to how well the data describe the isotropic model), but 
they nevertheless give similar results. While ILC does 
not show evidence for alignment as we defined it, the ev- 
idence for the alignment of the quadrupole and octopole 
planes with the ecliptic increases with the addition of 
the sky-cuts and foregrounds. This can happen if, for 
example, the real signal is contaminated by foregrounds, 
which destroy its evidence, but once the contamination 
is removed the signal comes out again. In Vdfs ALIGN1 
model is 42 times more likely than the random model 
(which corresponds to ~ 2.5a). As discussed above, how- 
ever, once the biased nature of the models is taken into 
account, the real statistical significance is much smaller. 
The main point of interest here is that the alignment with 
ecliptic is less sensitive to foreground contamination than 
other alignments discussed above. We also calculated the 
dot product of the ecliptic north pole with the Angular 
Momentum Dispersion vectors, but the results did not 
show any evidence for alignment. 



V. CONCLUSIONS 

We developed a method to incorporate the uncer- 
tainties in the foreground removal and/or sky cuts into 
the statistical analyses that otherwise require a full sky 
data. Our method is based on deriving the full multi- 
dimensional likelihood distribution of the multipole mo- 
ments given the data and the foreground uncertainties, 
noise or sky cuts. We compute this distribution using 
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Table 3: This table shows the two statistical tests used to asses the statistical evidence for ALIGN1 and ALIGN2 models as discussed in the 



text. 


Dataset -Ealigni/^null 


p(ALIGNl|data) 


-Balign2/-Enull 


p(ALIGN2|data) 


p(ALIGNl| isotropic) 


p(ALIGN2 isotropic) 


ILC < 1(T J 
Wd 2.8 ± 0.04 
Vdfs 41.7 ± 0.2 


4.3 ±0.08 
63.9 ± 0.3 


< 10" J 
19.3 ± 1 
16 ± 1.2 


19.0 ±1 
15.6 ±1 



0.2 0.4 0.6 o.; 



0.2 0.4 



0.8 











FIG. 11: The probability distribution functions for B^i (left 
to right) for the ILC (top), Wd (middle) and Vdfs (bottom) 
datasets. The most likely point is normalized to one. 



FIG. 10: The probability distribution for the B2 parameter. 
The prior distribution is flat. See Figure fTTI for the distribu- 
tion of the Bn parameters. The panels correspond to ILC 
(top), Wd (middle) and Vdfs (bottom). 



the MCMC sampling of the likelihood function, which 
is very fast, since we only need to invert the covariance 
matrix once. We recommend these methods are used to 
assess the statistical significance of an effect found in the 
full sky maps, to verify the sensitivity of the result to 
foreground uncertainties. 

As an application of the method we have performed 
a detailed analysis of various quadrupole and octopole 
statistics in the WMAP data. For our standard proce- 
dure we use three datasets, ranging from the most aggres- 
sive (and probably contaminated) full sky ILC/TOH map 
to the V channel with applied KP2 mask and foreground 
marginalization of all three major contaminants, namely 
the Dust, Free-Free and Synchrotron emission. We used 



MCMC method to determine the probability distribution 
for the realization of the 12-dimensional quadrupole and 
octopole multipole moments. 

We wish to emphasize again that probabilities of any 
a-posteriori statistic are always subjective (and some are 
more subjective than others, since they were more tai- 
lored for the data at hand). For this reason we focus 
here on the relative change in the probabilities between 
the least conservative ILC full sky map analysis and the 
most conservative map with galactic sky cut and fore- 
ground marginalization. The motivation is that if the 
probability changes strongly between these cases then it 
is not robust and may be contaminated by imperfect fore- 
ground removal in ILC/TOH map. This is of course not 
the only possibility: it is always possible that the data 
happen to contain important statistical information in 
the region most contaminated by the foregrounds. But 
our method provides some objective estimate of the prob- 
ability for this to happen. 

A point related to this is the question of how to for- 
mulate a statistically meaningful quantity that addresses 
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various claims in the literature. For example, one could 
test perfect alignment between quadrupole and octupolc 
(d = 1), but a somewhat less perfect alignment could 
also be of potential significance as a sign of an anomaly. 
We address this by computing the integrated probability 
for a statistic to exceed a certain value and compare it 
to the random case. As the chosen value approaches the 
limit (e.g. d = 1) the probability for the random case be- 
comes very small, while if the data show signs of anomaly 
the integrated probability from the real data will remain 
significant. The ratio of the two thus gives some infor- 
mation on whether there is anything anomalous in the 
data. 

We analyze the evidence for the low quadrupole and 
octopole. These are not particularly unlikely from ILC 
map, with the quadrupole probability being 3% and oc- 
topole being quite typical. These numbers do not change 
much by our more conservative treatment, so they ap- 
pear robust. They are however significantly higher than 
the original values quoted by WM AP [2^] , which are sta- 
tistically excluded and are therefore not a result of dif- 
ferences in foreground modeling. In a certain sense we 
have reached the opposite conclusion: from our analysis 
it appears impossible for the octopole to be anomalously 
low and the same is true for quadrupole in our Wd anal- 
ysis. The difference may be a consequence of the noisier 
estimator used by WMAP. 

We also discuss recent claims that the quadrupole and 
octopole are aligned. If one believes the ILC map, then 
the evidence for the quadrupole and octopole alignment 
is considerable. All three methods tested here, namely 
the maximum angular dispersion vectors, the multipolc 
vectors and the feature matching method indicate that 
the two are suspiciously aligned. However, as soon as 
foreground uncertainties are included the evidence for 
this alignment disappears. It is not unexpected that the 
probability distributions broaden, but what is surprising 
is how rapidly the evidence vanishes and how strongly 
perfect or even partial alignment is excluded by the data. 
This strongly suggests that much of the evidence of the 
alignment comes from the portion of the data most con- 
taminated by the galactic foregrounds. 

Plots shown in this paper indicate a considerable dif- 
ference between Wd and Vdfs cases. It is interesting to 
explore whether this difference comes from the number 
of templates being marginalised over or whether they are 
due to different frequency channels being employed. To 
investigate this we repeat the analysis using W channel 
data and marginalise over all three templates. The re- 
sults are very similar to the Wd case, indicating that 
the main difference is due to different channel being em- 
ployed. This suggests there might be additional system- 
atic effects that are not handled properly even by our 
more conservative treatment. One possibility is that ei- 
ther V or W channel MEM derived foreground maps are 
contaminated on the largest scales, or that there are sys- 
tematic contaminations in CMB maps Q . More work is 
needed to explore these various possibilities. 



Finally, we also present an example where our method 
can enhance the statistical significance by removing the 
contamination which would otherwise mask the evidence. 
We show that for the alignment of multipole vectors with 
ecliptic plane the statistical significance is not lowered 
by the foreground uncertainties. Once again, the statis- 
tical significance of this effect is unclear, but at least it 
seems clear that it is not significantly affected by the fore- 
grounds and may even be enhanced, once the foreground 
contamination is removed from the data. 

Our method is statistical and relies on the foreground 
templates to be faithful representation of all components 
that contaminate the data. Systematic uncertainties in 
the foregrounds translate in systematic uncertainties in 
derived quantities. Although our method provides a sta- 
tistical framework for assessing the statistical significance 
of various effects, the real improvement will come from 
better understanding and modelling of the foregrounds. 
This goal can be achieved using multi-frequency data 
combined with a better understanding of physical pro- 
cesses involved. In this case a less conservative treatment 
of the foregrounds may be possible. We have tried some 
of these examples in our tests. Analyzing the original V 
or W maps without sky cuts, but with foreground tem- 
plate marginalization, causes MCMC sampler not to con- 
verge, which is indicative of a complex likelihood in this 
case. Similarly, using sky cuts but with no foreground re- 
moval shows clear evidence of contamination and causes 
the quadrupole and octopole to increase significantly Q . 
Subtracting the WMAP recommended foregrounds and 
using A = 1 in equation [3] gives results very similar to 
A = oo. Thus we believe our results reflect the current 
uncertainties in the foreground subtraction and suggest 
these may be responsible for many of the anomalies seen 
in the WMAP data on large scales. 
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APPENDIX A: EVIDENCE CALCULATION 

The Bayes equation for the posterior probability of set 
of parameters 6 given a data-vector d can be written as 



P(0\d) = 



P(d\0)P{8) 
P(d) 



(Al) 
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The denominator of the right hand side is the evidence. 
See [2^ for a discussion of the usage of evidence as a 
Bayesian tool for model selection. Since the posterior 
must integrate to unity, one can write 

E = P(d) = f P(d\d)P(e), (A2) 
Je 

i.e. the evidence is the likelihood integrated over all pri- 
ors. Evidence for completely disjoint models is usually 
calculated by the thermodynamic integration durin g th e 
burn- in phase of the MCMC sampling (see e.g. |24|). 
When considered models are a subset of the most gen- 
eral model, it is possible to calculate the relative evidence 
from the MCMC samples of the most general model. We 
perform the integral of the Equation (|A2I) in bins that are 



0.02 in size over B2, -B31, B^- The integral is thus per- 
formed by adding up the prior probability corresponding 
to each sample. The prior probability is approximated to 
be constant over the entire bin. For the NULL model, the 
prior probability is obtained by binning the Monte Carlo 
simulations of random skies on the grid and normalizing. 
For the ALIGN1 and ALIGN2 models, the prior probabil- 
ity is one in the corresponding bin (i.e. B 3 i < 0.02 and 
B32 < 0.02 and zero otherwise). The prior probability for 
B 2 is constant at 0.02 for models ALIGN1 and NULL. For 
the ALIGN2 it is one in the first bin and zero otherwise. 

The error on the evidence are assumed to be only due 
to Poisson error in the number of samples in a given bin. 
If chains are well converged, this should indeed be the 
dominant error. 
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